Towards Exascale Scientific Metadata Management

Authors

  • Spyros Blanas
  • Surendra Byna
Abstract

Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still needs to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions.

Similar resources

Pantheon: Exascale File System Search for Scientific Computing

Modern scientific computing generates petabytes of data in billions of files that must be managed. These files are often organized, by name, in a hierarchical directory tree common to most file systems. As the scale of data has increased, this has proven to be a poor method of file organization. Recent tools have allowed for users to navigate files based on file metadata attributes to provide m...

Towards Exascale Distributed Data Management

Exascale eScience infrastructures will face important and critical challenges from both computational and data perspectives. Increasingly complex and parallel scientific codes will lead to the production of huge amounts of data. The large volume of data and the time needed to locate, access, analyze and visualize data will greatly impact the scientific productivity of scientists and research...

Metadata Workloads for Testing Big Storage Systems

Efficient namespace metadata management is becoming more important as next-generation file systems are designed for the peta and exascale era. A number of new metadata management schemes have been proposed. However, evaluation of these designs has been insufficient, mainly due to a lack of appropriate namespace metadata traces. Specifically, no Big Data storage system metadata trace is publicly ...

Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems

Many believe that the state-of-the-art yet decades-old high-performance computing (HPC) storage would not meet the I/O requirements of emerging exascale systems, mainly due to the segregation of compute and storage resources. Indeed, our simulation predicts, quantitatively, that the efficiency and availability would go towards zero as the system scale approaches exascale. This work proposes a new arch...

Storage Support for Data-Intensive Applications on Large Scale High-Performance Computing Systems

Many believe that the state-of-the-art yet decades-old high-performance computing (HPC) storage would not meet the I/O requirements of emerging exascale systems, mainly due to the segregation of compute and storage resources. Indeed, our simulation predicts, quantitatively, that the efficiency and availability would go towards zero as the system scale approaches exascale. This work proposes a new arch...

Journal:
  • CoRR

Volume abs/1503.08482

Publication date 2015